[improve](move-memtable) reduce flush token num #46001

Merged: 2 commits merged into apache:master on Dec 30, 2024


@kaijchen (Contributor) commented Dec 26, 2024

What problem does this PR solve?

Fix an OOM caused by creating too many flush tokens.
Reduce the number of flush tokens to one per tablet.
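
The description states only the outcome. Below is a minimal C++ sketch of the idea, assuming for illustration that each writer previously allocated its own token: writers for the same tablet look up one shared token, so the token count is bounded by the number of tablets rather than the number of writers. FlushToken, FlushTokenRegistry and tokens_for are hypothetical names for this sketch, not Doris classes, and the actual change in this PR may be structured differently.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>
#include <mutex>
#include <unordered_map>

// Hypothetical stand-in for a flush token. In a real engine it would carry
// per-token state (e.g. a handle into a flush thread pool, status, stats),
// which is what makes creating too many of them expensive.
struct FlushToken {
    explicit FlushToken(int64_t id) : tablet_id(id) {}
    int64_t tablet_id;
};

// Hypothetical registry: all writers for one tablet share a single token, so
// the number of live tokens is bounded by the number of tablets rather than
// by the number of writers.
class FlushTokenRegistry {
public:
    std::shared_ptr<FlushToken> get_or_create(int64_t tablet_id) {
        std::lock_guard<std::mutex> lock(mu_);
        std::shared_ptr<FlushToken>& slot = tokens_[tablet_id];
        if (!slot) {
            // First writer for this tablet creates the token; later ones reuse it.
            slot = std::make_shared<FlushToken>(tablet_id);
        }
        assert(slot->tablet_id == tablet_id);
        return slot;
    }

    std::size_t size() const {
        std::lock_guard<std::mutex> lock(mu_);
        return tokens_.size();
    }

private:
    mutable std::mutex mu_;
    std::unordered_map<int64_t, std::shared_ptr<FlushToken>> tokens_;
};

// Demo: `writers` writers assigned round-robin to `tablets` tablets.
// Returns how many distinct tokens the registry ends up holding.
std::size_t tokens_for(int writers, int tablets) {
    FlushTokenRegistry registry;
    for (int w = 0; w < writers; ++w) {
        registry.get_or_create(w % tablets);
    }
    return registry.size();
}
```

For example, tokens_for(64, 4) returns 4: 64 writers spread over 4 tablets share 4 tokens, where a per-writer scheme would hold 64. Guarding the map with a mutex keeps lookup and creation atomic, so two writers racing on the same tablet still end up with one token.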

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@Thearas (Contributor) commented Dec 26, 2024

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (ideally including the specific error message), and how it was fixed.
  2. Which behaviors were modified: what the previous behavior was, what it is now, why it was changed, and what impact it might have.
  3. What features were added, and why.
  4. Which code was refactored, and why.
  5. Which functions were optimized, and how they differ before and after the optimization.

@kaijchen (Contributor Author)

run buildall

@doris-robot

TPC-H: Total hot run time: 32620 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit 28a2fa6ee05887f272c07583a85941c505bd1c74, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17583	6136	6075	6075
q2	2057	317	186	186
q3	10372	1259	711	711
q4	10194	848	427	427
q5	7489	2169	1928	1928
q6	208	179	144	144
q7	885	743	604	604
q8	9236	1371	1115	1115
q9	5186	4937	4945	4937
q10	6783	2314	1853	1853
q11	461	278	265	265
q12	344	352	219	219
q13	17758	3634	3016	3016
q14	226	227	218	218
q15	562	498	508	498
q16	631	611	579	579
q17	568	845	320	320
q18	6967	6501	6462	6462
q19	1242	965	554	554
q20	309	330	190	190
q21	3096	2187	1998	1998
q22	366	330	321	321
Total cold run time: 102523 ms
Total hot run time: 32620 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6241	6270	6262	6262
q2	240	328	231	231
q3	2271	2681	2352	2352
q4	1475	1886	1410	1410
q5	4351	4733	4805	4733
q6	184	174	145	145
q7	2138	2024	1814	1814
q8	2570	2779	2631	2631
q9	7349	7186	7177	7177
q10	3052	3320	2828	2828
q11	573	550	486	486
q12	665	766	639	639
q13	3365	3839	3044	3044
q14	289	306	272	272
q15	562	503	502	502
q16	661	663	636	636
q17	1212	1731	1255	1255
q18	7660	7651	7174	7174
q19	768	1158	1030	1030
q20	1893	2003	1848	1848
q21	5535	5250	4924	4924
q22	620	632	567	567
Total cold run time: 53674 ms
Total hot run time: 51960 ms
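
The bot does not label its columns, but the totals suggest how they are computed: the first column sums to "Total cold run time", and "Total hot run time" equals the sum of the faster of the two hot runs (the last column). This is inferred from the numbers, not documented on this page. A small C++ sketch that rechecks Round 1 of this TPC-H report (commit 28a2fa6ee05887f272c07583a85941c505bd1c74), with the data copied from the Round 1 table:

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <cstdint>

// One TPC-H row as printed by the bot. Column meanings are inferred from the
// reported totals: one cold run followed by two hot runs, in milliseconds.
struct QueryTiming {
    int64_t cold_ms;
    int64_t hot1_ms;
    int64_t hot2_ms;
};

// Round 1 on commit 28a2fa6e, q1..q22, copied from the report.
constexpr std::array<QueryTiming, 22> kRound1 = {{
    {17583, 6136, 6075}, {2057, 317, 186},   {10372, 1259, 711},
    {10194, 848, 427},   {7489, 2169, 1928}, {208, 179, 144},
    {885, 743, 604},     {9236, 1371, 1115}, {5186, 4937, 4945},
    {6783, 2314, 1853},  {461, 278, 265},    {344, 352, 219},
    {17758, 3634, 3016}, {226, 227, 218},    {562, 498, 508},
    {631, 611, 579},     {568, 845, 320},    {6967, 6501, 6462},
    {1242, 965, 554},    {309, 330, 190},    {3096, 2187, 1998},
    {366, 330, 321},
}};

// The last printed column matches the faster of the two hot runs.
constexpr int64_t best_hot_ms(const QueryTiming& q) {
    return std::min(q.hot1_ms, q.hot2_ms);
}

// "Total cold run time": sum of the first column.
constexpr int64_t total_cold_ms() {
    int64_t sum = 0;
    for (std::size_t i = 0; i < kRound1.size(); ++i) sum += kRound1[i].cold_ms;
    return sum;
}

// "Total hot run time": sum of the per-query best hot run.
constexpr int64_t total_hot_ms() {
    int64_t sum = 0;
    for (std::size_t i = 0; i < kRound1.size(); ++i) sum += best_hot_ms(kRound1[i]);
    return sum;
}
```

This yields 102523 ms cold and 32620 ms hot, matching the bot's totals. The same relationship holds for the first ClickBench report (106.18 s cold, 31.6 s hot), where each row has one cold and two hot runs and no separate best column.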

@doris-robot

TPC-DS: Total hot run time: 189928 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit 28a2fa6ee05887f272c07583a85941c505bd1c74, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	954	375	360	360
query2	6518	2318	2306	2306
query3	6705	231	214	214
query4	33428	23779	23487	23487
query5	4340	632	475	475
query6	293	202	186	186
query7	4629	493	305	305
query8	317	246	271	246
query9	9772	2781	2772	2772
query10	488	299	237	237
query11	17950	15510	15124	15124
query12	155	106	101	101
query13	1666	541	432	432
query14	10249	6840	7334	6840
query15	224	201	184	184
query16	8250	618	434	434
query17	1540	752	567	567
query18	2091	413	304	304
query19	222	183	152	152
query20	124	115	114	114
query21	212	124	106	106
query22	4486	4355	4249	4249
query23	34907	33552	33419	33419
query24	6521	2321	2270	2270
query25	498	469	408	408
query26	1208	279	159	159
query27	2057	459	337	337
query28	5151	2501	2495	2495
query29	710	538	409	409
query30	231	186	151	151
query31	974	913	820	820
query32	81	62	58	58
query33	501	347	292	292
query34	757	839	511	511
query35	810	805	749	749
query36	1002	1058	937	937
query37	113	99	76	76
query38	4404	4152	4115	4115
query39	1488	1430	1425	1425
query40	209	118	102	102
query41	47	45	47	45
query42	125	105	155	105
query43	513	532	496	496
query44	1330	801	814	801
query45	181	178	162	162
query46	861	1048	639	639
query47	1930	1967	1901	1901
query48	379	409	313	313
query49	764	479	374	374
query50	635	664	389	389
query51	7211	7135	7167	7135
query52	102	100	89	89
query53	224	250	188	188
query54	470	479	407	407
query55	101	80	78	78
query56	241	259	245	245
query57	1218	1200	1152	1152
query58	245	225	242	225
query59	2955	3138	2858	2858
query60	269	266	250	250
query61	109	105	105	105
query62	849	772	725	725
query63	227	181	184	181
query64	4330	990	676	676
query65	3266	3183	3220	3183
query66	880	465	315	315
query67	16046	15782	15501	15501
query68	8503	774	519	519
query69	433	298	251	251
query70	1232	1176	1182	1176
query71	403	282	277	277
query72	5964	3881	3819	3819
query73	663	750	364	364
query74	10168	9266	8838	8838
query75	4346	3161	2659	2659
query76	4967	1195	789	789
query77	926	388	274	274
query78	10001	10272	9327	9327
query79	3301	898	605	605
query80	733	528	442	442
query81	461	276	228	228
query82	705	153	122	122
query83	193	165	139	139
query84	279	97	72	72
query85	785	347	296	296
query86	350	315	302	302
query87	4538	4448	4461	4448
query88	3465	2249	2198	2198
query89	432	340	295	295
query90	1888	189	182	182
query91	131	133	106	106
query92	64	55	61	55
query93	1925	891	533	533
query94	676	400	279	279
query95	325	263	246	246
query96	480	613	278	278
query97	2777	2884	2692	2692
query98	231	204	200	200
query99	1724	1578	1451	1451
Total cold run time: 294812 ms
Total hot run time: 189928 ms

@doris-robot

ClickBench: Total hot run time: 31.6 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit 28a2fa6ee05887f272c07583a85941c505bd1c74, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.03
query2	0.07	0.03	0.03
query3	0.23	0.07	0.07
query4	1.61	0.10	0.10
query5	0.42	0.40	0.41
query6	1.18	0.65	0.65
query7	0.02	0.02	0.02
query8	0.04	0.03	0.02
query9	0.58	0.50	0.51
query10	0.55	0.54	0.55
query11	0.15	0.10	0.12
query12	0.14	0.10	0.11
query13	0.61	0.61	0.59
query14	2.84	2.70	2.77
query15	0.91	0.84	0.83
query16	0.38	0.37	0.39
query17	1.04	1.00	0.99
query18	0.23	0.22	0.21
query19	1.91	1.87	1.94
query20	0.01	0.01	0.01
query21	15.35	0.99	0.59
query22	0.74	0.72	0.71
query23	15.32	1.44	0.64
query24	2.92	1.00	1.45
query25	0.13	0.19	0.15
query26	0.25	0.15	0.14
query27	0.06	0.04	0.06
query28	14.15	1.52	1.05
query29	12.57	3.96	3.28
query30	0.25	0.10	0.06
query31	2.81	0.59	0.37
query32	3.23	0.54	0.47
query33	3.02	3.15	3.19
query34	16.78	5.09	4.55
query35	4.48	4.48	4.47
query36	0.66	0.50	0.49
query37	0.10	0.06	0.06
query38	0.05	0.03	0.03
query39	0.04	0.03	0.03
query40	0.17	0.13	0.12
query41	0.08	0.02	0.02
query42	0.03	0.02	0.02
query43	0.04	0.03	0.03
Total cold run time: 106.18 s
Total hot run time: 31.6 s

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.83% (10104/26020)
Line Coverage: 29.83% (85335/286037)
Region Coverage: 28.98% (43611/150512)
Branch Coverage: 25.51% (22237/87176)
Coverage Report: http://coverage.selectdb-in.cc/coverage/28a2fa6ee05887f272c07583a85941c505bd1c74_28a2fa6ee05887f272c07583a85941c505bd1c74/report/index.html

@kaijchen (Contributor Author)

run buildall

@doris-robot

TeamCity be ut coverage result:
Function Coverage: 38.83% (10103/26020)
Line Coverage: 29.84% (85359/286039)
Region Coverage: 28.97% (43605/150512)
Branch Coverage: 25.51% (22235/87176)
Coverage Report: http://coverage.selectdb-in.cc/coverage/d78623b363841607506e70ce1808196e1594e06f_d78623b363841607506e70ce1808196e1594e06f/report/index.html

@doris-robot

TPC-H: Total hot run time: 32898 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpch-tools
Tpch sf100 test result on commit d78623b363841607506e70ce1808196e1594e06f, data reload: false

------ Round 1 ----------------------------------
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	17580	6131	6054	6054
q2	2052	298	165	165
q3	10445	1225	738	738
q4	10234	885	439	439
q5	8176	2182	1965	1965
q6	206	181	146	146
q7	903	766	602	602
q8	9234	1368	1211	1211
q9	5216	4999	4964	4964
q10	6773	2332	1886	1886
q11	477	284	256	256
q12	355	373	229	229
q13	18133	3595	2984	2984
q14	243	234	213	213
q15	565	505	501	501
q16	645	628	600	600
q17	583	843	332	332
q18	7244	6735	6542	6542
q19	2837	981	562	562
q20	298	315	185	185
q21	2859	2199	2021	2021
q22	361	326	303	303
Total cold run time: 105419 ms
Total hot run time: 32898 ms

----- Round 2, with runtime_filter_mode=off -----
query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
q1	6348	6412	6621	6412
q2	240	324	233	233
q3	2284	2716	2306	2306
q4	1463	1815	1374	1374
q5	4369	4984	5093	4984
q6	195	179	145	145
q7	2191	2077	1962	1962
q8	2734	2933	2741	2741
q9	7642	7527	7423	7423
q10	3043	3359	2924	2924
q11	602	505	505	505
q12	636	766	599	599
q13	3503	3801	3179	3179
q14	291	325	272	272
q15	571	533	510	510
q16	662	675	651	651
q17	1204	1754	1274	1274
q18	7643	7568	7271	7271
q19	868	1106	1114	1106
q20	1974	2059	1904	1904
q21	6356	5488	4903	4903
q22	627	598	560	560
Total cold run time: 55446 ms
Total hot run time: 53238 ms

@doris-robot

TPC-DS: Total hot run time: 196340 ms
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/tpcds-tools
TPC-DS sf100 test result on commit d78623b363841607506e70ce1808196e1594e06f, data reload: false

query	cold (ms)	hot 1 (ms)	hot 2 (ms)	best hot (ms)
query1	1324	993	908	908
query2	6485	2435	2329	2329
query3	11140	4762	4841	4762
query4	33327	23587	23198	23198
query5	4098	623	494	494
query6	276	190	169	169
query7	3985	494	298	298
query8	289	245	231	231
query9	9297	2762	2749	2749
query10	465	306	260	260
query11	17985	15211	15126	15126
query12	154	110	100	100
query13	1581	546	401	401
query14	10433	7317	7474	7317
query15	237	220	188	188
query16	8147	635	492	492
query17	1556	816	595	595
query18	2133	426	333	333
query19	214	210	180	180
query20	129	118	123	118
query21	213	126	127	126
query22	4655	4552	4507	4507
query23	34572	33375	33745	33375
query24	6502	2313	2386	2313
query25	479	444	387	387
query26	878	269	159	159
query27	2074	481	359	359
query28	5504	2511	2481	2481
query29	631	567	418	418
query30	216	186	149	149
query31	963	923	840	840
query32	72	59	56	56
query33	482	368	322	322
query34	779	876	529	529
query35	791	840	777	777
query36	1057	1065	965	965
query37	113	93	75	75
query38	4254	4225	4051	4051
query39	1540	1467	1456	1456
query40	219	115	103	103
query41	50	46	47	46
query42	121	103	97	97
query43	516	523	496	496
query44	1352	819	856	819
query45	187	182	182	182
query46	877	1056	664	664
query47	1999	2061	1973	1973
query48	386	414	340	340
query49	715	483	378	378
query50	648	655	422	422
query51	7355	7160	7344	7160
query52	110	107	91	91
query53	230	252	192	192
query54	485	518	409	409
query55	82	80	82	80
query56	265	267	242	242
query57	1225	1247	1193	1193
query58	236	222	227	222
query59	3173	3201	3214	3201
query60	291	273	261	261
query61	108	106	112	106
query62	868	819	759	759
query63	240	200	193	193
query64	3348	1024	677	677
query65	3363	3270	3320	3270
query66	854	404	307	307
query67	16425	15789	15474	15474
query68	9974	778	536	536
query69	495	297	250	250
query70	1213	1165	1184	1165
query71	459	284	257	257
query72	6276	3872	3819	3819
query73	1143	759	373	373
query74	10216	8972	8720	8720
query75	4527	3148	2593	2593
query76	5577	1194	789	789
query77	1023	357	279	279
query78	9961	10137	9436	9436
query79	3766	876	608	608
query80	730	517	437	437
query81	492	268	240	240
query82	565	145	124	124
query83	199	160	146	146
query84	279	96	68	68
query85	819	371	301	301
query86	340	305	288	288
query87	4750	4587	4371	4371
query88	3267	2265	2233	2233
query89	455	331	313	313
query90	2102	245	185	185
query91	137	142	105	105
query92	67	54	57	54
query93	2301	914	550	550
query94	665	387	288	288
query95	325	259	254	254
query96	495	601	291	291
query97	2744	2809	2693	2693
query98	220	227	202	202
query99	1681	1555	1422	1422
Total cold run time: 302021 ms
Total hot run time: 196340 ms

@doris-robot

ClickBench: Total hot run time: 31.28 s
machine: 'aliyun_ecs.c7a.8xlarge_32C64G'
scripts: https://github.com/apache/doris/tree/master/tools/clickbench-tools
ClickBench test result on commit d78623b363841607506e70ce1808196e1594e06f, data reload: false

query	cold (s)	hot 1 (s)	hot 2 (s)
query1	0.03	0.03	0.05
query2	0.07	0.04	0.03
query3	0.24	0.08	0.07
query4	1.61	0.11	0.10
query5	0.43	0.43	0.41
query6	1.15	0.65	0.64
query7	0.02	0.02	0.02
query8	0.04	0.03	0.03
query9	0.58	0.51	0.50
query10	0.57	0.57	0.55
query11	0.15	0.10	0.10
query12	0.14	0.11	0.11
query13	0.62	0.62	0.59
query14	2.77	2.75	2.72
query15	0.90	0.82	0.82
query16	0.38	0.37	0.39
query17	1.08	1.01	0.97
query18	0.22	0.21	0.21
query19	1.96	1.80	2.00
query20	0.02	0.01	0.02
query21	15.38	0.93	0.59
query22	0.75	0.76	0.61
query23	15.35	1.40	0.57
query24	2.68	0.91	1.73
query25	0.20	0.19	0.18
query26	0.20	0.14	0.14
query27	0.06	0.05	0.05
query28	14.00	1.51	1.06
query29	12.56	3.90	3.32
query30	0.26	0.11	0.07
query31	2.80	0.60	0.38
query32	3.23	0.55	0.46
query33	3.09	3.13	3.10
query34	16.84	5.14	4.50
query35	4.55	4.46	4.48
query36	0.67	0.49	0.48
query37	0.09	0.06	0.06
query38	0.04	0.04	0.03
query39	0.04	0.02	0.02
query40	0.18	0.14	0.13
query41	0.07	0.03	0.03
query42	0.03	0.02	0.03
query43	0.04	0.03	0.03
Total cold run time: 106.09 s
Total hot run time: 31.28 s

@liaoxin01 (Contributor) left a comment

LGTM

github-actions bot added the approved label (indicates a PR has been approved by one committer) on Dec 27, 2024

PR approved by at least one committer and no changes requested.

PR approved by anyone and no changes requested.

@sollhui (Contributor) left a comment

LGTM

@liaoxin01 merged commit d139934 into apache:master on Dec 30, 2024 (25 of 27 checks passed)
github-actions bot pushed a commit that referenced this pull request Dec 30, 2024
liaoxin01 pushed a commit that referenced this pull request Jan 3, 2025
liaoxin01 pushed a commit to liaoxin01/doris that referenced this pull request Jan 13, 2025
liaoxin01 added a commit that referenced this pull request Jan 14, 2025
Labels: approved, dev/2.1.8-merged, dev/3.0.4-merged, reviewed

6 participants